ABSTRACT


NLP is a processor being developed at the Naval Post- 
graduate School for research in natural language man-machine 
communication. With this system text can be translated into 
an entity-attribute-value information structure, and such a 
structure can be translated into text. These two processes, 
called decoding and encoding, respectively, are specified by 
writing "rules" in a language designed for this system. 

This thesis reports on a scheme for storing these rules 
in the computer in a compact fashion, and describes the 
related routines. The savings in core storage and CPU time 
achieved by using this scheme are given for a particular
application of NLP.


I. INTRODUCTION


The increased use of computers to solve problems is due 
to the greater availability of computing machinery and the 
associated increase in reliability, speed and accuracy. 

One dominant factor which seems to discourage some users
from computer applications is that man-machine interaction
is normally accomplished through a programming language.
Thus, there is a requirement for familiarity with such a
language or, as more often is the situation, for having
programming personnel perform the interaction for the user.
The second method mentioned is rapidly becoming intolerable,
for often a communication gap develops between the programmer
and the user. Also, the cost of programming personnel
appears to be increasing as rapidly as other computer operating
costs are decreasing, and this trend is expected to continue
into the near future.

A solution to the spiraling cost problem is to automate
the programming function. One possibility for doing this is
to have the computer become a natural language processor.
Such a processor could accept natural language statements
and questions as input, utilize syntactic and semantic
information to translate the input into an internal data
structure, and then from this produce a computer program to
solve the stated problem. The difficulty in creating such
a processor is that most natural languages are ambiguous and
imprecise in their structure and are not readily adaptable
to computer application. But, within the last few years
the fields of artificial intelligence and linguistics have
met with some success in formally describing natural languages
such as English. Some examples are Noam Chomsky's Theory of
Transformational Grammar [1] and Sydney Lamb's Stratificational
Grammar [2].


A. A NATURAL LANGUAGE PROCESSOR 

Currently there is a research project at the Naval
Postgraduate School on a natural language processor called NLP
[3,4]. This system provides a rule language and associated
processors for "decoding" natural language text into an
Internal Problem Description (IPD) and for "encoding" an
IPD into text in some natural language or some programming
language.

The current version of NLP is one for producing GPSS
simulation programs for simple queuing problems. This
application of NLP is referred to as NLPQ. The objective of
NLPQ is to enable an analyst to solve simple queuing problems
by describing the problem to the computer in English and
receiving as output from NLPQ a GPSS program.

Work on NLPQ has been reported in a number of master's
theses. Reference 5 describes an Internal Problem Description
(IPD) for storing simple queuing problems and a procedure
which encodes the IPD into a GPSS program. Reference 6
extended the GPSS encoding procedures and provided additional
encoding procedures which translate the IPD into an equivalent
English description of the queuing problem. Reference 7
supplemented NLPQ with an interactive question-answer scheme
for generating the IPD. Reference 8 added an interrogator
for inspecting the IPD to ensure that a proper GPSS program
will be produced.

The programming language used for NLP is FORTRAN IV, and
the program runs under the CP/CMS time sharing system on
an IBM 360/67 computer.


B. THESIS OBJECTIVE

The research objective of this thesis was to develop a
scheme for storing the compiled decoding and encoding rules
of NLP in a more compact form. A secondary goal was to do
this in such a way as to reduce the amount of "paging"
performed by CP/CMS and thereby reduce the CPU time required
to execute NLP.


C. ORGANIZATION OF THESIS

Section II of this report describes the MAIN routine of
NLP, some of the available NLP commands, parameters, switches
and some printing commands available to the user. Section
III reports on the compilation of NLP rules and the
processing of named record definitions. Section IV describes the
decoding and encoding processes and the operation of the NLP
interpreter (CRSEG). Section V discusses the savings in core
storage and CPU time achieved, and section VI contains the
conclusions and some recommendations for future NLP research.

In order to understand this thesis a familiarity with the
material in Refs. 3 and 4 is necessary. Listings of the
FORTRAN program are available from Professor George E. Heidorn
of the Naval Postgraduate School.


II. MAIN NLP PROGRAM 


Before NLP can process input text, it must compile the
rules and named record definitions which specify a particular
NLP application. The information obtained by performing
this function is stored in the CELL array and in the A-array.
These arrays and the information they contain are referred
to as the Information Storage Structure (ISS) in this thesis.
The ISS does not include information obtained while processing
input text (i.e., the IPD). The CELL array contains the named
records and segment type records, while the A-array contains
the compiled rules. The CELL array is described in Ref. 4.
The A-array will be discussed in detail in the next two
chapters.

This section will discuss in general terms how the NLP
program initially sets up the ISS, and also the functions of
some of the parameters and switches which the user can set.
Because of the importance of being able to look at information
actually stored in the ISS, a discussion of print commands is
also presented.


A. NLP MAIN ROUTINE

The NLP program has five basic sections, named NLP, PRNAMS,
DECODE, ENCODE and LPR. The main program which starts the
system is in the NLP section and is referred to as the MAIN
routine or NLP MAIN. A flow chart of the MAIN routine execution is
shown in Figure 1. The function of NLP MAIN is to
initialize variables, the CELL array and the A-array, process
NLP commands, and store the ISS in an output file.
Initialization is accomplished either internally or from
a previously written file. An example of how the user
interacts with the system during initialization and transferring
of the ISS to an output file is included in Ref. 3. Once
initialization is completed the system is ready to accept
NLP commands. A list of commands and their function is
presented in Appendix A. A command must begin in column 1
and end with a colon. The specific command determines which
routine NLP MAIN will transfer control to.


B. PARAMETERS AND SWITCHES 

NLP has a number of parameters and switches which can be 
set in a NAMELIST statement to control program execution. 
Switches are variables which can have values of "true" or 
"false", and parameters are variables which can take on other 
values. A listing and description of parameters and switches 
is contained in Appendix B and Appendix C respectively. The 
purpose of these variables is to allow the user to alter 
program execution to a small degree, and to obtain tracings 
of program execution. Parameter and switch variables can be 
set whenever the program requests optional data. A sample
reply to such a request is:

 &p prtsw=t, out6=8 &end


(Flow chart. Legible box text: START; READ OPTIONAL DATA; IF
FLNUMB=0 INITIALIZE INTERNALLY, ELSE FROM BINARY FILE; READ
NUMBER OF NEXT INPUT FILE; IF FLNUMB=0, READ OUTPUT FILE
NUMBER; WRITE OUT BINARY FILE; REQUEST IS RECOGNIZED, EXECUTE
PROPER ROUTINES WITH INPUT FROM SPECIFIED FILE.)

Figure 1. NLP MAIN Processing






The characters "&p" must be preceded by a blank space. The
example shown causes the print switch to be set and designates
file 8 as the output file for any write statements having
OUT6 as their output file parameter. Certain parts of the
NLP program allow only certain parameters and switches to
be set. The program listing must be consulted to determine
when such specific variables can be set. Tracings of program
execution are especially helpful in "debugging" revisions
and additions to NLP.
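
The reply format described above can be illustrated with a small sketch. It is written in Python rather than FORTRAN, and it assumes the usual NAMELIST form of name=value items separated by commas; the function name parse_optional_data and the restriction to logical and integer values are assumptions made for this illustration, not part of NLP.

```python
def parse_optional_data(reply):
    """Parse a reply such as ' &p prtsw=t, out6=8 &end' (hypothetical helper)."""
    # The group name must be preceded by a blank space.
    if not reply.startswith(" &"):
        raise ValueError("group name must be preceded by a blank")
    body = reply.strip()
    if not body.lower().endswith("&end"):
        raise ValueError("reply must end with &end")
    # Split off the group name, then drop the trailing '&end'.
    group, _, rest = body[1:].partition(" ")
    rest = rest[: -len("&end")]
    values = {}
    for item in rest.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, raw = item.partition("=")
        name, raw = name.strip().upper(), raw.strip().upper()
        if raw in ("T", ".TRUE."):
            values[name] = True       # a switch set to "true"
        elif raw in ("F", ".FALSE."):
            values[name] = False      # a switch set to "false"
        else:
            values[name] = int(raw)   # a parameter (integers only in this sketch)
    return group.upper(), values


print(parse_optional_data(" &p prtsw=t, out6=8 &end"))
```

For the sample reply, the sketch reports the group P with the switch PRTSW set to true and the parameter OUT6 set to 8.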


C. PRINT COMMAND

The print command can be used to print selected
information stored in the ISS (Information Storage Structure).
Print command format and examples are:

COMMANDS:

PRINT SEgmentype [name], [level-number]:
PRINT REcord [number], [level-number]:
PRINT INdicator [name], [level-number]:
PRINT ATtribute [name], [level-number]:
PRINT ROutine [name], [level-number]:
PRINT * [name], [level-number]:
PRINT MEmory, [level-number]:
PRINT ['named-record-name'],[level-number]:
PRINT ARray [number]:
PRINT ARray [number]- [number]:

EXAMPLES:

PRINT SE VERB:
PRINT ARRAY 1 - 200:
PRINT AR 52:
PRINT 'ACTNLIST',2:


When specifying the type of information to be printed, only 
the first two characters need be entered after the print 
command. Hexadecimal numbers are indicated by "z" following 
the number. The level-number is optional and specifies to 
what level of detail the output should be. For the last 
example above, the named record for 'ACTNLIST' and any records 
pointed to by 'ACTNLIST' are printed. 
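
The two-character abbreviation rule can be sketched as follows. This is an illustration in Python, not the NLP FORTRAN code; the list PRINT_KINDS and the function print_kind are hypothetical names, and the list simply repeats the keywords from the command format above.

```python
PRINT_KINDS = ["SEGMENTYPE", "RECORD", "INDICATOR", "ATTRIBUTE",
               "ROUTINE", "MEMORY", "ARRAY"]


def print_kind(command):
    """Return the kind of information a PRINT command asks for."""
    text = command.strip().rstrip(":").strip()
    words = text.split(None, 2)
    if len(words) < 2 or words[0].upper() != "PRINT":
        raise ValueError("not a PRINT command")
    token = words[1].upper()
    if token.startswith("'"):
        return "NAMED-RECORD"          # name given in single quotation marks
    if token.startswith("*"):
        return "*"
    for kind in PRINT_KINDS:
        # Only the first two characters of the keyword are significant.
        if len(token) >= 2 and kind.startswith(token[:2]):
            return kind
    raise ValueError("unknown kind: " + token)


print(print_kind("PRINT SE VERB:"))
print(print_kind("PRINT AR 52:"))
```

Because only the first two characters are compared, PRINT AR 52: and PRINT ARRAY 1 - 200: select the same kind of output.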

Printing of A-array information is a feature which was
implemented as part of the work done for this thesis. A
detailed discussion of the A-array content is presented in
the next section, where examples will show what information
is stored in the A-array.


III. COMPILING OF RULES AND NAMED RECORD DEFINITIONS


The rules and named record definitions are the elements
which specify a unique NLP application. An example of such
an application is NLPQ. The NLP program is
language-independent, but a specific set of rules determines how NLP will
process input and output. Thus, the first function of NLP
is to compile and process rules.

Reference 4 describes in detail the rules and named record
definitions of NLP. There are basically two types of rules,
decoding and encoding. An example of a decoding rule is:
VERBS (ED) E D --> VERBE (SUP (VERBS) | PASTPART PAGIE)
An example of an encoding rule is:
VERBP (PASTPART) --> VERBS (SUP(VERBP)) E D

Each rule has a "left" part and a "right" part, separated by
an "arrow". The purpose of a rule is to specify the grammar
and conditions under which segment types on the right-part
will be created after all the conditions on the left-part are
satisfied.

Named record definitions provide information about words
and concepts which the system can recognize and process. An
example of a named record definition is:

WAIT  ('ACTIVITY',NSFX,S,ING,ED,ER)

Rules and named record definitions are compiled and
processed by the PRNAMS section of NLP. Within PRNAMS are
five routines: PRULES, PRNREC, PRCNLB, GETSYM and CODE. PRULES
processes the rules, and PRNREC processes the named record
definitions. Both make use of PRCNLB, GETSYM and CODE.

Rules are stored in the A-array and the named records are 
stored in the CELL array. The segment type records produced 
during rule processing are stored in the CELL array. 

Familiarity with the concept of segment type records
is important for an understanding of rule processing. Thus,
this section begins with a brief discussion of segment type
records. The compiling process of a rule will be described
by explaining what effects PRULES, PRCNLB, GETSYM and CODE
have on a rule. The processing of named record definitions
is also described because the same routines which process the
rules, except for PRULES, also process the attribute
specifications of named record definitions.


A. SEGMENT TYPE RECORDS

The names of the elements on the left and right parts of
a rule are called segment types. The segment types on the
left side of a rule usually contain conditions in parentheses,
and when the input to the rules satisfies all the conditions
on the left then segment records are created according to the
segment types and their related creation specifications on
the right side. The creation specifications are in
parentheses. Segment type records are stored in the CELL array
in an entity-attribute-value fashion as described in
Reference 4. Appendix D lists the attributes of segment type
records.


B. PROCESSING OF RULES (PRULES)

PRULES is the routine that processes the decoding and
encoding rules. Basically, PRULES has three parts, which
are initialization, a left-part processor, and a right-part
processor. The input to PRULES is a set of rules to be
compiled and stored in the A-array.

1. The A-array Structure

The physical structure of the A-array is shown in
Figure 2, and the conceptual structure is shown in Figure
3, for a sample rule. The A-array stores the compiled
decoding and encoding rules for a particular NLP application.
These rules do not change during execution of the NLP program
and thus can be compiled and stored as compactly as possible,
keeping in mind economic retrieval of the A-array contents.
The A-array is a REAL*8 one-dimensional array of 5000 elements.
This particular physical structure was chosen because the
largest IBM 360 FORTRAN variable is REAL*8 (64 bits), and
the addressing limitation of INTEGER*2 subscripts is 32,767.
All addresses in NLP require at most two bytes (because
both the CELL array and the A-array have less than 32,768
elements). The information in the A-array is stored in byte
format using eight bytes per element. This provides for a
maximum storage capacity of approximately 256K bytes.
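
The capacity figures quoted above follow from simple arithmetic, shown here as a short Python check. The constant names are chosen for this illustration only; the one outside fact used is that the largest positive INTEGER*2 value is 32,767.

```python
BYTES_PER_WORD = 8             # one REAL*8 element holds eight bytes
DIMENSION = 5000               # size of the A-array as described above
MAX_SUBSCRIPT = 2 ** 15 - 1    # largest positive INTEGER*2 value, 32767

current_bytes = DIMENSION * BYTES_PER_WORD     # bytes in the present A-array
max_bytes = MAX_SUBSCRIPT * BYTES_PER_WORD     # bytes if every subscript were used

print(current_bytes)
print(max_bytes, max_bytes / 1024)
```

The present dimension of 5000 words gives 40,000 bytes, and the two-byte subscript limit gives 262,136 bytes, which is just under 256K.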
             A-ARRAY (entries are hexadecimal numbers)

   word      contents
      1      00 05 00 01 39 FF 08 FB      rule
      2      FF 00 00 00 00 00 00 00
      3      00 91 00 01 39 FF 08 FB      rule
      4      6F 01 FF 00 00 00 00 00
      .
   1224      04 D2 C5 09 86 C5 0A 0C      <-- XATPT (temporary word pointer)
      .
   2014      00 00 FF 0C .. 1E C6 0C      last rule
   2015      3F 47 0E FF 00 00 00 00      <-- XATWRD (top word used)
      .
   5000      00 00 00 00 00 00 00 00

   Each element is REAL*8 (64 bits or 8 bytes); DIMENSION (5000).

Figure 2. Physical Structure and Storage of the A-Array





A-ARRAY (40000 BYTES), byte numbers in decimal; the word
numbers (706), (707), (708), (2269) and (5000) are marked.
Contents of the sample rule, in storage order:

   BEGINNING OF FIRST RULE
   XALINK (LINK FIELD)
   FIRST CONSTITUENT CONDITION SPECIFICATIONS
   SEPARATOR FOR CONSTITUENTS
   STYPE OF SECOND CONSTITUENT ON LEFT
   SECOND CONSTITUENT CONDITION SPECIFICATIONS
   ARROW (-->), SEPARATING LEFT AND RIGHT PARTS OF A RULE
   STYPE OF FIRST CONSTITUENT ON RIGHT
   CREATION SPECIFICATIONS FOR FIRST CONSTITUENT ON RIGHT
   END OF RULE

   XATWRD (TOP WORD OF STORAGE)
   XATBYT (TOP BYTE OF STORAGE)

SAMPLE RULE STORED:
ADV(S'FILLER') VERBPH(NOUNAL) -->
VERBPH(PRM)

Figure 3. Conceptual Structure and Storage of the A-Array

The conceptual structure of the A-array is a
LOGICAL*1 (one byte) one-dimensional array of 40,000 bytes.
The conceptual structure will be used to describe the storage
and retrieval of information from the A-array.

The dimension statements of the A-array and the CELL
array can easily be changed within NLP MAIN. Besides
changing the dimension statements, MXCELL (maximum subscript of
the CELL array) and XMAXA (maximum subscript of the A-array)
must also be changed. Thus, only four statements in the
MAIN routine need be changed.
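
The relation between the conceptual byte view and the physical eight-byte words can be sketched as a pair of conversions. This is a Python illustration assuming 1-based numbering for both words and bytes, as in FORTRAN subscripts; the function names are hypothetical.

```python
BYTES_PER_WORD = 8


def to_word_and_byte(byte_number):
    """Map a 1-based conceptual byte number to (word, byte within word)."""
    word, offset = divmod(byte_number - 1, BYTES_PER_WORD)
    return word + 1, offset + 1


def to_byte_number(word, byte_in_word):
    """Inverse mapping: physical (word, byte) back to the conceptual byte number."""
    return (word - 1) * BYTES_PER_WORD + byte_in_word


print(to_word_and_byte(9))        # first byte of the second word
print(to_byte_number(5000, 8))    # last byte of a 5000-word array
```

Under these assumptions the last byte of word 5000 is conceptual byte 40,000, which agrees with the array size given above.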

2. Left-Part Rule Processor

The following sample decoding rule from NLPQ will be
used for illustrative purposes in this section:

VERB(MODAL) VERBPH(INF) -->
VERBPH(PRM,MODAL=MODAL(VERB),VFORM=VFORM(VERB),INTERG)

The rule has two segment types on the left of the arrow (NLPQ
encoding rules have only one segment type on the left) and
one on the right. Each of the segment types on the left has
condition specifications in parentheses and the segment type
on the right has creation specifications in parentheses.
Figure 4A depicts in block diagram format the processing of
the left-part of a rule and Figure 4B does the same for the
right-part.

PRULES begins compiling rules by first setting some
variables. One of the first variables to be set is XDESW
(decoding switch). An XDESW value of "true" indicates that
decoding rules are to be processed, and a value of "false"

ENTRY TO PRULES

DETERMINE IF PROCESSING DECODING/ENCODING RULES.

READ RULE TO BE PROCESSED. IF END OF FILE ENCOUNTERED,
EXIT FROM PRULES.

IF --> ENCOUNTERED, GO TO RIGHT-PART PROCESSOR.

IF NOT FIRST CONSTITUENT, ENTER INST=0.

CHECK FOR CONTEXTUAL CONSTITUENT OF TYPE 2.

GET NAME AND SEGMENT TYPE ADDRESS.

IF NOT FIRST CONSTITUENT OF A RULE, SKIP THE LINKING STEP.
IF THIS IS THE FIRST RULE BEGINNING WITH CURRENT SEGMENT
TYPE, STORE A POINTER IN SEGMENT TYPE RECORD, POINTING TO
THE A-ARRAY ENTRY POINT FOR CURRENT RULE.
OTHERWISE, MAKE XALINK ENTRY.

MAKE APPLICABLE ENTRIES INTO THE A-ARRAY, XLSS ARRAY AND
XLSSPT ARRAY.

PROCESS CONDITION SPECIFICATIONS.

CHECK FOR CONTEXTUAL CONSTITUENT OF TYPE 1 OR FOR A STAR
AND A NUMBER. MAKE PROPER A-ARRAY ENTRY.

Figure 4A. PRULES Entry and Left-Part Rule Processing

ENTRY TO RIGHT-PART RULE PROCESSOR

ENTER INST=255 FOR ARROW SEPARATING LEFT AND RIGHT PARTS
OF A RULE.

TEST IF NEXT INPUT CHARACTER IS AN END OF FILE OR THE
BEGINNING OF THE NEXT RULE. OTHERWISE, CONTINUE PROCESSING
RIGHT SIDE OF RULE.

IF CONSTITUENT IS NOT THE FIRST ON THE RIGHT SIDE OF A RULE,
ENTER INST=0.

COLLECT SEGMENT TYPE NAME AND CHECK IF IT IS THE SAME AS A
SEGMENT TYPE NAME ON LEFT SIDE. IF SO, SET NCC.

GET SEGMENT TYPE ADDRESS.

PROCESS ANY STAR AND NUMBER COMBINATION FOLLOWING A RIGHT
SIDE NAME.

ENTER STYPE VALUE.

IF NCC NOT EQUAL TO ZERO, ENTER INSTRUCTION FOR AUTOMATIC
COPY INTO THE A-ARRAY.

PROCESS CONDITION SPECIFICATION.

ENTER INST=255 FOR AN END OF A RULE AND PROCESS NEXT RULE,
OR ENTER INST=255 FOR AN END OF FILE AND EXIT.

Figure 4B. Right-Part Rule Processing

means that encoding rules are to be processed. Next, the
first line of the first rule is read. As each new rule is
read, it is assigned a rule-number (the value of XATWRD
which is a pointer to the top word used in the A-array).
For each rule, storage begins in a new word (element) of
the A-array.

Once the rule-number has been assigned, the left-part
of the rule can be processed. The name of the first
constituent is "collected" and stored in NMS (name variable).
Then the identifying number (address) of the segment type
record for the constituent must be determined. If the name
associated with NMS is equal to that associated with NMSS
(name variable for name of first constituent from the
preceding rule processed) then STYPE (segment type record address
for NMS) is equal to XCSTYP (segment type record address for
NMSS). Else, a search must be made by scanning all segment
type record names of rules already stored which begin with
the first character of the name stored in NMS. If the search
indicates that this is the first occurrence of such a segment
type then a record of this segment type is created in the ISS
(Information Storage Structure). STYPE is set to the address
of the newly created record. For all segment types which are
the first constituent of a rule the content of attribute XATRI 
KE ocwrionsdecodumco and 8 før encoding) is assigned to the 
variable XLRULE. If XLRULE (address of last rule of same
segment type) has a value of zero then a pointer is entered
which points to a word in the A-array where the first rule
beginning with that segment type will be stored. Also, the
variables NMSS, XCSTYP (segment type address for NMSS value)
and XLRULE are reset to conform to the constituent currently
being processed. If XLRULE contains a pointer to an element
of the A-array then the current rule being processed is not
the first rule beginning with such a segment type. For this
case, the link field (first two bytes of a rule stored in the
A-array) of the last rule processed for this segment type
must be located and the rule-number of the present rule is
inserted. XALINK is the link field variable of a rule. After
a link has been made, NMSS, XCSTYP and XLRULE are reset.

All constituents of a rule have their STYPE value
entered into the A-array prior to the processing of their
condition or creation specifications. This entry is
accomplished with a call to CODE using an argument of -4,
the one exception being the first constituent of a rule.

Prior to processing conditions, the NMS and STYPE
values are entered into the XLSS (left side segment type
name) array and the XLSSPT (left side segment type pointer)
array respectively.

The rule-number of the sample rule is 02B6 (694
decimal) and the first A-array word of the previous rule
with the same segment type is:


027B6 7PB=05766. 00 OUSTBA 


with the link field entry (first two bytes) pointing to the
rule currently being processed.
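
The linking of rules that begin with the same segment type can be sketched as a chain threaded through the two-byte link fields. The Python below is an illustration only: the class SegmentType and the functions add_rule and rules_for are hypothetical, the rule numbers 10 and 20 are made up, and the big-endian byte order is an assumption based on the IBM 360.

```python
import struct


class SegmentType:
    """Stand-in for a segment type record (hypothetical layout)."""

    def __init__(self, name):
        self.name = name
        self.first_rule = 0   # word number of the first rule starting with this type
        self.last_rule = 0    # plays the role of XLRULE in the text


def add_rule(a_array, seg, rule_number):
    """Chain a newly stored rule onto the rules that begin with seg."""
    if seg.last_rule == 0:
        seg.first_rule = rule_number            # first rule for this segment type
    else:
        # Link field: first two bytes of the previous rule's first word.
        offset = (seg.last_rule - 1) * 8
        a_array[offset:offset + 2] = struct.pack(">H", rule_number)
    seg.last_rule = rule_number


def rules_for(a_array, seg):
    """Follow the link fields and list every rule number for seg."""
    found, rule = [], seg.first_rule
    while rule != 0:
        found.append(rule)
        offset = (rule - 1) * 8
        (rule,) = struct.unpack(">H", a_array[offset:offset + 2])
    return found


a = bytearray(5000 * 8)
verb = SegmentType("VERB")
for number in (10, 20, 0x02B6):
    add_rule(a, verb, number)
print(rules_for(a, verb))
```

After the third call, the link field of rule 20 holds the bytes 02 B6, the same kind of entry the text describes for the rule preceding the sample rule.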

At this stage of processing, a check for a left
parenthesis is made. A positive result indicates that the
constituent has conditions which must be compiled. PRCNLB,
using GETSYM and CODE, acts as a compiler and is called to
process these condition specifications. After compiling the
first constituent of the sample rule, the content of the
A-array for the constituent is:
D000 VCD eis IG 00000000 


While the left-part of a rule is being processed,
two checks are performed on decoding rules to determine if
the constituent being processed is a contextual constituent.
There are two types of contextual constituents. A type 1
is indicated by a slash (/) appearing after a constituent in
the rule. A slash appearing next to the beginning of a
constituent signifies a type 2 contextual constituent. The
checks are performed as the rule is scanned from left to
right, and any contextual information is entered into the
A-array with a call to CODE using an argument of -3. The
variable NUMB contains the contextual type value.

A constituent can also be followed by a star (*)
and a number. If the number is missing then a 3 is used as
the default value. Such information is entered into the
A-array with the same call to CODE as for contextual
information. A constituent cannot have both a slash and a
star associated with it. The variable NUMB is set to the
number following the star. The call to CODE is made after
the condition specifications, if any, have been processed.

When a constituent has been processed, program
execution returns to the beginning of the left-part processor.
The next group of characters in the rule is located and if
the group is an arrow then control is switched to the
right-part processor. Otherwise, the next constituent of the rule
has been located, as is the case for the example, and a
zero-value byte is entered into the A-array with a call to CODE
using an argument of -2. The zero-value byte has the purpose
of separating constituents. The second and any following
constituents are processed much like the first except that
NMSS, XCSTYP, and XLRULE are not reset, and linking is not
performed.

When the arrow is encountered, the left-part of a
rule has been processed and control of program execution is
transferred to the right-part processor. The first action
of the right-part processor is to enter the value 255 into
the A-array to indicate the separation between the two parts.
This is accomplished with a call to CODE using an argument
of -5.

After processing the left-part of a rule, the A-array
content for the sample rule is:


VOS CD Ps 16 00 O09 BA 


82 21:0 771,2 0020020020020 


3. Right-Part Rule Processor

The right-part processor initially performs the same
functions as the left-part processor. That is, the NMS
(segment type name) of the constituent must be collected and
its STYPE (segment type address) determined. Since the NMS
value of a constituent on the right can equal one already
processed on the left, the XLSS array is searched first. If
the search is successful, the STYPE value is obtained from
the corresponding XLSSPT array entry. Otherwise, a search of
segment type records, the same as performed by the left-part
processor, is made. If no match is found, a record for the
segment type is created in the Information Storage Structure
and STYPE is set to point to the new record. If the segment
type name is followed by a star and a number, the value of
MEL (attribute four) of the record pointed to by STYPE is
set to the number.

All constituents on the right have their STYPE value
entered into the A-array. Also, for any NMS value equal to
an entry in the XLSS array a special entry (instruction code
47) is made in the A-array. This special entry specifies
that the particular constituent being processed is to be a
copy of some constituent on the left. For decoding rules a
number is inserted into the A-array which specifies which
constituent is to be copied during execution of the rule.
This information is entered with a call to CODE using an
argument of -1. For encoding rules there is only one
constituent on the left and thus it is not necessary to enter
a number.

For encoding rules the last constituent on the right
whose NMS value is equal to the NMSS (same as XLSS(1)) value
does not require that a copy of the segment record from the
left be made during execution of the rule. Instead, the
right side constituent can use the same segment record as
the one on the left, making changes as designated by the
creation specifications. To indicate that such a situation
exists, the instruction code of 47 which was set for the
right side constituent is changed to an instruction code of
b? in this case.

If the constituent currently being processed has any
creation specifications, then a call to PRCNLB will compile
those specifications. After returning from PRCNLB, control
of processing is transferred to the beginning of the
right-part processor which will test if there are more constituents
on the right or if another rule follows. If there are more
constituents, each is separated in the A-array by a
zero-value byte. If there is another rule, then control of the
program is transferred to the PRULES portion which processes
the next rule. If there are no more rules then a normal
return from PRULES is made.

When returning from PRULES, the sample rule will have
been compiled and stored in the A-array in the following
manner:


(DONO GIU 8? 160002 EA 
SDE Oo BA GE 02.365 
ZIDE OTDA 18 16009. 18 
TORSE VU OD C9 13 


UD=65B712 2220070070900 

C. PROCESSING CONDITION AND CREATION SPECIFICATIONS (PRCNLB)

Whenever a rule or named record definition has
information within parentheses, PRCNLB is called to process that
information (i.e., to translate it into a series of elementary
commands which can later be executed by TSTCND or CRSEG).
The function of PRCNLB within the compiling process is as a
stacking decision procedure. While performing this function
PRCNLB repeatedly calls GETSYM which returns a TSCODE and
TSADDR value for each input symbol recognized by GETSYM.
PRCNLB, by using TSCODE, decides whether to stack TSCODE and
TSADDR onto the SCODE and SADDR vectors, respectively, or to
cause a reduction of the vectors with a call to CODE which
processes the information in SCODE and SADDR, storing the
results in the A-array.

A normal return from PRCNLB occurs when a right
parenthesis is recognized as an input symbol.
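
The stack-or-reduce behavior described for PRCNLB can be sketched in Python. The text does not give the decision table that PRCNLB applies to TSCODE, so the rule used here (stack every symbol, reduce at a comma or at the closing parenthesis) is an assumption made only to show the control flow. The symbol kinds NAME, COMMA and RPAREN and the function prcnlb_sketch are likewise hypothetical, and nested parentheses are not handled.

```python
def prcnlb_sketch(symbols, reduce):
    """Stack symbols until a delimiter, then hand the stack to `reduce`.

    symbols: iterable of (code, value) pairs, as a lexical analyzer might
             return them (standing in for TSCODE and TSADDR).
    reduce:  callable standing in for CODE; it receives the stacked pairs.
    Returns True on a normal return (right parenthesis seen).
    """
    scode, saddr = [], []                # the two parallel stacks
    for code, value in symbols:
        if code in ("COMMA", "RPAREN"):
            if scode:
                reduce(list(zip(scode, saddr)))   # reduction step
                scode.clear()
                saddr.clear()
            if code == "RPAREN":
                return True              # normal return from the procedure
        else:
            scode.append(code)           # stacking step
            saddr.append(value)
    return False                         # input ended without a right parenthesis


groups = []
done = prcnlb_sketch(
    [("NAME", "PRM"), ("COMMA", None), ("NAME", "INTERG"), ("RPAREN", None)],
    groups.append,
)
print(done, groups)
```

In the usage shown, each specification between commas reaches the reduction step as one group, and the call returns when the right parenthesis is read.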


D. GETTING NEXT INPUT SYMBOL (GETSYM) 

Input symbols recognized by NLP are all the letters,
digits, and most of the special symbols found on an IBM 29
card punch. In addition the system will process as symbols
the six standard arithmetic logicals when between two periods
(e.g., .EQ.), names of up to eight characters, and numbers.
Also recognized are names with eight or fewer characters
within single quotation marks and EBCDIC strings of any
length within double quotation marks. The special symbols
which can be recognized by GETSYM are found in the SCDTAB
array and are indicated by an entry value greater than zero.

The function that GETSYM performs within the compiling
procedure is that of a lexical analyzer. GETSYM collects the
input characters and translates them to symbol codes
(TSCODE) with associated values (TSADDR). Appendix E contains
a list of symbol codes.
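
A lexical analyzer of the kind described can be sketched with a regular expression. The Python below is an illustration, not GETSYM itself: the kind names (RELOP, QNAME, STRING, NAME, NUMBER, SPECIAL) are hypothetical stand-ins for the symbol codes of Appendix E, and the set of special characters accepted is an assumption.

```python
import re

TOKEN = re.compile(r"""
    \s*(?:
      (?P<RELOP>\.(?:LT|LE|EQ|NE|GT|GE)\.)         # relational operator between periods
    | '(?P<QNAME>[^']{1,8})'                       # name in single quotation marks
    | "(?P<STRING>[^"]*)"                          # string in double quotation marks
    | (?P<NAME>[A-Z][A-Z0-9]{0,7})(?![A-Z0-9])     # name of up to eight characters
    | (?P<NUMBER>\d+)                              # number
    | (?P<SPECIAL>[()=,*/$+\-<>])                  # assumed subset of special symbols
    )""", re.VERBOSE)


def symbols(text):
    """Return (kind, value) pairs for each symbol found in text."""
    pos, out = 0, []
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unrecognized input at column %d" % (pos + 1))
        kind = match.lastgroup           # name of the alternative that matched
        out.append((kind, match.group(kind)))
        pos = match.end()
    return out


print(symbols("VERBPH(PRM,MODAL=MODAL(VERB))"))
```

Each pair plays the role of one TSCODE and TSADDR result; a name longer than eight characters is rejected in this sketch instead of being split.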


E. PRODUCING CODE FOR CONDITION AND CREATION SPECIFICATIONS
(CODE)

The reduction function within the compiling procedure
is performed by CODE. When CODE is called from PRCNLB,
reduction on the content of SCODE and SADDR occurs. When
the call is from PRULES, only specified information is entered
into the A-array.

All the calls from PRULES use negative arguments. A -1
enters an instruction code of 47 into the A-array followed
by the NCC (number of constituent to be copied) value as set
by PRULES. A -2 enters into the A-array a zero-value byte
which separates the constituents of a rule. A -3 enters an
instruction code of 51 into the A-array. This is the
instruction code for setting the contextual constituent type.
The entry is followed by the NUMB value as set by PRULES.
For an argument of -4 an instruction code of 194 is created
which has the purpose of setting aside two bytes in the
A-array into which will be entered a segment type address
(all segment type addresses use two bytes of storage). The
-5 argument has the effect of entering the value 255 (FF
hexadecimal) into one byte of the A-array.
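
The entries made for the negative arguments can be summarized in a small dispatch sketch. This Python illustration makes several assumptions that the text does not settle: that NCC and NUMB each occupy one byte, that the fifth argument value is -5, and that for -4 only the two address bytes are stored (the text leaves open whether the code 194 itself appears in the A-array). The function name code_from_prules is hypothetical.

```python
def code_from_prules(arg, out, ncc=0, numb=0, stype=0):
    """Append to the byte list `out` the entry requested by a negative argument."""
    if arg == -1:
        out.extend([47, ncc])             # copy instruction, then constituent number
    elif arg == -2:
        out.append(0)                     # zero-value byte separating constituents
    elif arg == -3:
        out.extend([51, numb])            # contextual type or star number
    elif arg == -4:
        out.extend(divmod(stype, 256))    # two bytes for a segment type address (assumed layout)
    elif arg == -5:
        out.append(255)                   # arrow or end of rule (FF hexadecimal)
    else:
        raise ValueError("this sketch handles only arguments -1 to -5")
    return out


entries = []
code_from_prules(-4, entries, stype=0x02B6)
code_from_prules(-2, entries)
code_from_prules(-5, entries)
print(entries)
```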
The A-array is initialized by NLP MAIN from a specified
input file, or internally with all array elements set to
zero. Information which is to be stored by CODE is
contained in the variables INST and XADDR. These are INTEGER*2
variables which must be separated into single byte components
because information is stored into the A-array in byte
format. Figure 5A shows how EQUIVALENCING is used to
separate INST and XADDR into their respective XSADR byte
components. XSADR is a LOGICAL*1 array of eight elements.
XADDR1 and XADDR2 are INTEGER*2 variables whose values are
usually equal to the first and second bytes of XADDR. When
CODE has determined which bytes of XSADR are to be stored
into the A-array, the contents of those bytes are
transferred into the XA array. Figure 5B shows the EQUIVALENCING
of XA and XAWORD, and an A-array element A(XATWRD) as it
relates to XAWORD. XA is a LOGICAL*1 array of eight elements.
It receives the byte values to be stored from XSADR, and
through XAWORD (one eight-byte word) enters one word of
information into A(XATWRD) (the A-array element pointed
to by the top word used pointer). XATBYT is a pointer which
keeps track of which byte in XA is the top byte used.

As will be seen in the next chapter, to retrieve
information from the A-array the above procedure is reversed.
But, before information can be retrieved, pointers to the
desired A-array word and byte must be set. The pointers for
retrieval of information are XATPT and XAPT. XATPT points
to the A-array word and XAPT points to the desired byte
in the XA array.
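
The word and byte pointers described above can be illustrated
with a short Python sketch. It is not the FORTRAN of NLP: the
class name AArray and its methods are hypothetical, subscripts
start at zero rather than one, and a bytearray stands in for
the EQUIVALENCE of XA with XAWORD.

```python
WORD_BYTES = 8  # each A-array element is one eight-byte word


class AArray:
    """Byte-addressable store built from 8-byte words (hypothetical model)."""

    def __init__(self, n_words):
        self.words = [bytearray(WORD_BYTES) for _ in range(n_words)]
        self.top_word = 0   # plays the role of XATWRD
        self.top_byte = 0   # plays the role of XATBYT: bytes used in top word

    def store(self, value):
        """Append one byte, moving to the next word when the current one is full."""
        if self.top_byte == WORD_BYTES:
            self.top_word += 1
            self.top_byte = 0
        self.words[self.top_word][self.top_byte] = value & 0xFF
        self.top_byte += 1

    def fetch(self, word, byte):
        """Read the byte selected by a (word, byte) pair, like XATPT and XAPT."""
        return self.words[word][byte]


a = AArray(n_words=4)
for b in range(1, 11):          # store ten bytes: 1, 2, ..., 10
    a.store(b)
print(a.top_word, a.top_byte)   # position reached after the tenth byte
print(a.fetch(1, 1))            # the tenth byte sits in word 1, byte 1
```

Storage always proceeds at the top of the array, whereas
retrieval may start at any (word, byte) pair, which is why the
two pairs of pointers are kept separate.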

       INST          XADDR          XADDR1         XADDR2
   XSADR(1) (2)   XSADR(3) (4)   XSADR(5) (6)   XSADR(7) (8)

                  I (POINTER FOR XSADR)

       MOVE TO XADDR1: XSADR(6)=XSADR(3)
       MOVE TO XADDR2: XSADR(8)=XSADR(4)

       EQUIVALENCE (INST,XSADR(1)), (XADDR,XSADR(3)),
                   (XADDR1,XSADR(5)), (XADDR2,XSADR(7))

       Figure 5A.  Equivalencing XSADR, INST,
                   XADDR, XADDR1, XADDR2


                  A(XATWRD OR XATPT)

                      | XAWORD |
                      | XALINK |
       XA(1) XA(2) XA(3) XA(4) XA(5) XA(6) XA(7) XA(8)

       XATBYT OR XAPT (BYTE POINTERS FOR XA)

       EQUIVALENCE (XAWORD, XALINK, XA)

       Figure 5B.  Equivalencing XAWORD, XALINK, XA

When processing information in SCODE and SADDR, CODE
begins with the last entry in SCODE and eventually ends with
the first entry. SCODE values, which are symbol codes,
specify to CODE what meaning input symbols to NLP have. As
each symbol code is removed from SCODE, a branch is made to
the FORTRAN statements which process the code. The FORTRAN
statements within CODE translate the symbol codes into
appropriate instruction codes. A list of instruction codes
and their meaning is shown in Appendix F. Once INST
(variable initially set to an instruction code) and XADDR
(variable whose content is set equal to the SADDR value
currently being processed) are properly set, CODE enters
their respective values into the A-array. Figure 6 shows
a flow chart of how information is entered into the A-array.

Before CODE can store information into the A-array a
determination must be made as to how many bytes of storage
are required for INST and XADDR. First, the space
requirements for XADDR must be determined because the actual
instruction code which is finally stored depends on the XADDR
value. The range of NLP instruction codes is from 1 to 63.
However, depending upon the number of bytes required for
XADDR, multiples of 64 are added, extending the range of
INST from 1 to 255. An INST value less than 64 signifies
that no bytes are required for XADDR. An INST value between
64 and 127 means that one byte of storage is required. A
value between 128 and 191 means that the two bytes of XADDR
are equivalent and thus only one byte need be stored. For
values greater than 191 the two bytes in the A-array following
the INST value contain the XADDR value. The one exception
is an INST value of 255 which does not have any associated
XADDR value.


       SET VARIABLES INST, XADDR,
       XADDR1, XADDR2, I, XI

       IF XADDR=0

       INST=INST+64, IF XADDR1=0

       INST=INST+64, IF XADDR1=XADDR2

       INST=INST+64

       INCREMENT XATBYT. IF XATBYT=8
       INCREMENT XATWRD AND SET XAWORD=0.

       ENTER XSADR(I) VALUE INTO
       XA(XATBYT) ELEMENT

       IF INST<192, I=I+1
       IF INST<64, I=I+1

       IF I IS LESS THAN 4

       OTHERWISE, IF INST=110
       (INSTRUCTION CODE 46) SAVE LOCATION
       OF XADDR FOR FUTURE BACKSTUFFING

       IF XADDR HAS A NEGATIVE VALUE THEN
       BACKSTUFF A POINTER

       Figure 6.  Storage Procedure for the A-Array

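
The rule for choosing the stored instruction code can be
summarized in a few lines of Python. This is a sketch under
stated assumptions, not the CODE routine itself: the function
name encode is hypothetical, XADDR is taken to be a
non-negative 16-bit value whose first byte is the high-order
byte, and the negative XADDR values used for backstuffing are
not modelled.

```python
def encode(inst, xaddr):
    """Return the bytes stored for one instruction code and its XADDR value."""
    if not 1 <= inst <= 63:
        raise ValueError("base instruction codes run from 1 to 63")
    if not 0 <= xaddr <= 0xFFFF:
        raise ValueError("XADDR is assumed to fit in two bytes")
    first, second = (xaddr >> 8) & 0xFF, xaddr & 0xFF   # the two bytes of XADDR
    if xaddr == 0:
        return bytes([inst])                    # below 64: no XADDR bytes
    if first == 0:
        return bytes([inst + 64, second])       # 64-127: one byte follows
    if first == second:
        return bytes([inst + 128, second])      # 128-191: both bytes equal
    return bytes([inst + 192, first, second])   # above 191: two bytes follow


print(list(encode(5, 0)))        # one byte in total
print(list(encode(5, 7)))        # code 5 + 64, then one byte
print(list(encode(5, 0x0303)))   # code 5 + 128, then one byte
print(list(encode(2, 0x1234)))   # code 2 + 192 = 194, then two bytes
```

Because most XADDR values are small, the one-byte and
zero-byte forms are expected to dominate, which is where the
compaction comes from.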

An INST value of 46 also requires some special
processing. XADDR for such an INST will require one byte of
storage, but the XADDR value to be stored is not available
at the time INST 46 is put out. Thus, INST value 46 is
coded to 110 and a zero-value byte is entered for the XADDR
value. The A-array location of this zero-value byte is set
by the value of XCURNT (contains the current absolute byte
address in the A-array) and is saved in the ATSTOR (attribute
store) array. The SCODE value is changed to 7 and the
respective SADDR location is set to the negative value of
ATI (current ATSTOR subscript). When the proper XADDR value
becomes available, the zero-value byte address stored in
ATSTOR is retrieved and a pointer value entered at the byte
address location. The value which is backstuffed is a
pointer to an A-array location where the proper XADDR value
associated with INST value 46 can be found. The pointer value
is the relative number of bytes from the zero-value byte to
where the XADDR value is stored. An example of when INST
value 46 occurs is:

     (@N=5)

The "@" symbol signifies that the attribute whose number is
in N be set equal to 5.
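
The backstuffing of INST value 46 can be sketched as follows.
The names emit_inst46 and backstuff are hypothetical, a
bytearray stands in for the A-array, and the sketch simply
assumes that the awaited value is stored at the current end of
the code; where NLP actually places it is not stated here.

```python
code = bytearray()      # stands in for the A-array, one entry per byte
atstor = []             # saved addresses of placeholder bytes (like ATSTOR)


def emit_inst46():
    """Emit INST 46 coded as 110 plus a zero placeholder; return its ticket."""
    code.append(46 + 64)            # 110: one XADDR byte follows
    code.append(0)                  # placeholder, filled in later
    atstor.append(len(code) - 1)    # absolute address of the placeholder
    return len(atstor) - 1


def backstuff(ticket, value):
    """Store the value at the current end and point the placeholder at it."""
    slot = atstor[ticket]
    code.append(value)
    code[slot] = len(code) - 1 - slot   # relative distance in bytes


t = emit_inst46()
code.extend([4 + 64, 5])    # some other instruction emitted in between
backstuff(t, 9)
print(list(code))
```

Following the placeholder byte by the number of bytes it
contains leads to the stored value, so the interpreter needs
no absolute address.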
F. PROCESSING OF NAMED RECORD DEFINITIONS (PRNREC)

Named record definitions look very similar to the
constituents of a rule and are processed in much the same
manner, except that PRNREC is used instead of PRULES. The
main difference is that the code generated from the creation
specifications is executed immediately by CRSEG, rather than
simply left in the A-array to be used later. Figure 7 shows
in flow chart form how named record definitions are
processed. A detailed discussion of named record definitions
can be found in Ref. 4.

PRNREC collects the name and creates a record in the
CELL array. XATPT and XAPT are respectively set to the top
available word and byte in the A-array. When a left
parenthesis is encountered the information in parentheses is
processed by PRCNLB.

To enter the attribute specifications into the CELL array,
PRNREC calls CRSEG which executes the code just compiled
for the named record definition and stored in the A-array,
as will be discussed in the next chapter. After returning
from CRSEG the next named record definition can be processed,
and the content of the A-array for the previous named record
definition is no longer needed, and can be erased. This is
accomplished by resetting XATWRD to the value which it
contained when entering PRNREC, and resetting XATBYT to zero.
This allows the next named record definition to use the same
A-array storage locations that the previous named record
definition used.
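
The reuse of A-array storage between named record definitions
amounts to remembering the top of the array on entry and
restoring it on exit. A minimal Python sketch, in which the
function process_named_record and the execute argument are
hypothetical stand-ins for PRNREC and CRSEG:

```python
a_array = bytearray(b"\x05\x45\x07\xff")   # code already compiled for rules


def process_named_record(compiled, execute):
    """Compile into the top of the store, execute, then release the space."""
    mark = len(a_array)             # top of the store on entry
    a_array.extend(compiled)        # code for this definition
    result = execute(bytes(a_array[mark:]))
    del a_array[mark:]              # reset the top pointer
    return result


before = len(a_array)
n = process_named_record(b"\x44\x05\xff", execute=len)
print(n, len(a_array) == before)
```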

       ENTRY PRNREC

       IF TR6 THEN READ OPTIONAL DATA

       IF NEXT CHARACTER IS END OF FILE
       THEN RETURN

       OTHERWISE, GET NAME OF NAMED
       RECORD

       CREATE A RECORD FOR THE NAME
       LOCATE LEFT PARENTHESIS

       SET XATPT TO FIRST AVAILABLE
       A-ARRAY ELEMENT

       PROCESS ATTRIBUTE SPECIFICATIONS

       CALL CRSEG TO EXECUTE THE COMPILED
       ATTRIBUTE SPECIFICATIONS

       RESET POINTER TO TOP A-ARRAY
       ELEMENT USED

       IF TR6 THEN PRINT SEGMENT RECORD

       Figure 7.  Processing of Named Record Definitions


       GET THE NEXT CHARACTER FROM THE INPUT
       STREAM AND CONSIDER IT A SEGMENT OF
       THAT TYPE. IF IT IS ":EOF:" THEN
       RETURN.

       FOR EACH RULE BEGINNING WITH THAT TYPE
       OF SEGMENT CREATE A PARTIAL
       CONSTITUENT LIST.

       IF THE PARTIAL CONSTITUENT LIST
       BECOMES A COMPLETE CONSTITUENT LIST,
       FILE IT IN THE LIST OF COMPLETE
       CONSTITUENT LISTS. OTHERWISE, FILE
       THE PARTIAL CONSTITUENT LIST IN THE
       SEGMENT TYPE RECORD OF THE NEXT
       CONSTITUENT THAT THE PARTIAL
       CONSTITUENT LIST IS WAITING FOR.

       IF THERE ARE ANY PARTIAL CONSTITUENT
       LISTS WAITING FOR AN INPUT SEGMENT OF
       THAT TYPE, AND FOR WHICH THE
       SPECIFIED CONDITIONS ARE MET, ADD
       THE SEGMENT TO THE PARTIAL
       CONSTITUENT LIST.

       IF THERE ARE NO COMPLETE CONSTITUENT
       LISTS THEN RETURN TO GETTING THE
       NEXT INPUT CHARACTER.

       OTHERWISE, TAKE THE TOP COMPLETE
       CONSTITUENT LIST OFF THE LIST OF
       LISTS AND CREATE A SEGMENT RECORD
       ACCORDING TO THE RIGHT HAND SIDE OF
       THE RULE ASSOCIATED WITH THE
       COMPLETE CONSTITUENT LIST.

       Figure 8.  Decoding Algorithm

1. DECODE

When NLP MAIN recognizes the command DECODE: (or
TEXT:), routine DECODE is called to process the input text
which follows. But before the input text is read, DECODE
enters a period (.) and a blank (signified by #) at the
front of the input stream. This has the function of setting
up the proper variables which will expect the input text to
be the start of a new sentence.

Before any text processing is performed, DECODE will
request optional data if switch TR6 was set to true,
allowing the user to set the program parameters and switches
as indicated in Appendices B and C.

To begin processing, DECODE removes from the input
stream the first character. At this time the input stream
contains a period and a blank. DECODE processes the two
characters and thereby sets the stage for processing input
text. After the first line of text is read the input stream
may have the following characters, for example:

     .#VEHICLES#ARRIVE#AT#A#STATION.

DECODE obtains the next character from the input
stream and considers it to be a segment of that type. Then
NEWSEG and ADDSEG are called to process the character,
providing DECODE with a list of complete constituent lists.
These complete constituent lists contain all the instances
of the decoding rules which have had their left-part conditions
satisfied with the appearance in the input stream of this
particular character. If the list is empty, then the next
input character is processed. Otherwise, DECODE removes the
complete constituent lists from the list of lists, one at
a time, and creates segment records as specified by the
right part of the rule whose conditions were met.

The rules which are to be executed are stored in the
A-array as they were compiled, and must be retrieved by a
process which is essentially the reverse of that discussed
in Section III.E. The complete constituent list contains
the XATPT and XAPT values which point to the byte in the
A-array where the information concerning the right-part of
a rule begins. The first two bytes retrieved contain the
segment type record address (STYPE value) of the first
constituent on the right. The execution of the creation
specifications (information in parentheses for segment types
on the right-part of a rule) is performed by CRSEG which is
called from NEWSEG, which in turn is called from DECODE.
For each constituent on the right side of the rule, DECODE
sets the STYPE variable and calls NEWSEG. If while
processing the right-part of a rule the value 255 is
encountered, the next entry on the list of complete
constituent lists is processed. When all such entries have
been processed, the next character from the input stream is
obtained and processed. When an end of file symbol (:EOF:)
is encountered in the input stream DECODE returns control to
NLP MAIN.
2. NEWSEG

When NEWSEG is called, it has the function of
investigating the new segment type under consideration. This
investigation consists of locating all the rules which begin
with a segment of this type. A pointer to the first such
rule is obtained from the APFCR attribute of the segment
type record, and each rule points to the next one, as will
be described below. Each rule on the list has its first
constituent condition specifications checked by TSTCND. If
the condition specifications are met, NEWSEG creates a partial
constituent list, which serves as an indicator of what state
of recognition a particular instance of a rule is in.
ADDSEG will determine if the status of the rule is to "wait"
for another constituent or to "yield" a new segment type
(on the right).

After returning from the call to ADDSEG the next rule 
beginning with a segment of the current type is processed. 
NXTRL contains the address of the next rule to be processed. 
NXTRL obtains its value from the XALINK (link to the next 
rule of same segment type) field of the previous rule. When 
NXTRL equals zero, the end of the list has been reached. 
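
The chain of rules followed through NXTRL is an ordinary
linked list terminated by a zero link. A minimal Python
sketch, in which the rule addresses, the rule names and the
dictionary layout are all hypothetical:

```python
# Each rule record carries a link to the next rule that begins with the same
# segment type; a link of 0 marks the end of the chain.
rules = {
    10: {"name": "R1", "link": 25},
    25: {"name": "R2", "link": 40},
    40: {"name": "R3", "link": 0},
}


def rules_starting_at(first):
    """Yield rule names by following link fields until a zero link."""
    nxtrl = first               # plays the role of NXTRL
    while nxtrl != 0:
        rule = rules[nxtrl]
        yield rule["name"]
        nxtrl = rule["link"]    # like reading the XALINK field


print(list(rules_starting_at(10)))
```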

Once the list of rules beginning with a segment of the
current type is exhausted, the partial constituent lists
waiting for a segment of that type are processed. First, the
list of those waiting for a segment of the current type is
located. Then, only those for whom the segment occurs in the
proper position in the input are processed. If the condition
specifications for this constituent are met, the segment is
added to the partial constituent list by a call to ADDSEG.

3. ADDSEG

With each segment to be added to a partial constituent
list ADDSEG is called. ADDSEG determines if the partial
constituent list should be filed in the list of complete
constituent lists (if there are no more constituents in the
rule), or if a pointer should be filed in the segment type
record of the next constituent of the rule. The second
process has the effect of signaling for which input segment
a rule is waiting. If the input segment satisfies the last
condition of a rule then the partial constituent list
becomes a complete constituent list and yields a new segment
type.
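
The "wait" or "yield" decision made by ADDSEG can be sketched
in Python as follows. The data layout is an assumption made
for illustration: a rule is a tuple of constituent type names,
and the segment type names DET and NOUN are hypothetical.

```python
from collections import defaultdict

waiting = defaultdict(list)   # segment type -> partial lists waiting for it
complete = []                 # the list of complete constituent lists


def addseg(rule, matched):
    """File a partial constituent list after one more constituent matched.

    rule is a tuple of constituent types; matched counts how many of them
    have been recognized so far.
    """
    if matched == len(rule):
        complete.append(rule)                            # "yield"
    else:
        waiting[rule[matched]].append((rule, matched))   # "wait"


addseg(("DET", "NOUN"), 1)    # one of two constituents seen: waits for NOUN
addseg(("NOUN",), 1)          # single-constituent rule: complete at once
print(dict(waiting))
print(complete)
```

Filing the partial list under the type of its next constituent
means that an arriving segment only has to examine the lists
that are actually waiting for it.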


B. ENCODING PROCESS 

The encoding process provides the user with output for
some given input. If the input was a natural language
description of a queuing problem, as is the case for NLPQ,
the output could be a GPSS program. The execution of the
encoding rules specifies what the output will be.

Figure 9 is a flow chart of the encoding algorithm.
The algorithm is executed when the encoding command is
recognized. The form of the encoding command can be a
sentence describing what encoding action is to be performed.
Two examples are:

     Describe the problem in English.

     Write a program for this problem.

The "sentence" commands are processed by the decoding
algorithm, which then calls ENCODE after recognizing what
action is to be performed.


       PUT INITIAL SEGMENT AND SEGMENT TYPE
       POINTERS ON STACK OUTLST.

       REMOVE TOP SEGMENT TYPE POINTER FROM
       STACK. IF IT IS A TERMINAL SEGMENT
       TYPE PUT ITS NAME IN OUTPUT STREAM
       BY CALLING OUTCHR.

       OTHERWISE, REMOVE SEGMENT
       POINTER FROM STACK.

       IF THE SEGMENT TYPE IS "OUTPUT",
       PROCESS ASSOCIATED INFORMATION.

       FIND AN ENCODING RULE TO APPLY.

       PLACE SEGMENT TYPE POINTER AND
       SEGMENT POINTER FROM THE RIGHT SIDE
       OF THE RULE ONTO A TEMPORARY LIST.

       PLACE POINTERS FROM TEMPORARY
       LIST ONTO THE STACK OUTLST IN
       REVERSE ORDER.

       IF STACK IS EMPTY, FORCE OUTPUT OF
       LAST LINE IN OUTPUT BUFFER AND
       EXIT ENCODE.

       OTHERWISE, LOOP IF STACK IS NOT EMPTY.

       Figure 9.  Encoding Algorithm


ENCODE is called with two arguments, STYPEE and SGMNTE.
STYPEE is a segment type pointer and SGMNTE is a segment
pointer. First, SGMNTE is entered on the OUTLST stack and
then STYPEE is entered. Next, a segment type pointer is
removed from the top of the stack. If the segment is a
terminal segment type, its name is placed in the output
stream with a call to OUTCHR. Otherwise, a segment pointer
is removed from the top of the stack. If the segment type
name associated with the segment pointer is "output" then
more information is usually entered into the output stream.
If the name is not "output", then testing the left-part
conditions of encoding rules beginning with such a segment
type name begins. If a rule has its conditions satisfied
then the segment type pointer and a segment pointer of the
first constituent on the right are placed on a temporary list.
Then the pointers from the temporary list are placed onto
OUTLST, in reverse order.

When all the entries on the OUTLST stack have been
processed, the output of the last line in the buffer is
forced out and a return from ENCODE is made.
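
The stack discipline of ENCODE can be illustrated with a much
simplified Python sketch. It keeps only segment types, so
segment pointers, rule conditions and the special "output"
type are left out; the function name encode_text, the rule
table and the type names are hypothetical.

```python
def encode_text(start, rules, terminals):
    """Expand a start type into terminal names using a stack.

    rules maps a non-terminal type to the list of types on its right side;
    terminals is the set of types whose names go straight to the output.
    """
    out = []
    stack = [start]                     # plays the role of OUTLST
    while stack:
        stype = stack.pop()             # take the top entry
        if stype in terminals:
            out.append(stype)           # like a call to OUTCHR
            continue
        temp = list(rules[stype])       # right side, left to right
        stack.extend(reversed(temp))    # reversed, so the first ends on top
    return out


rules = {"S": ["NP", "VP"], "NP": ["vehicles"], "VP": ["arrive"]}
print(encode_text("S", rules, {"vehicles", "arrive"}))
```

Placing the right side on the stack in reverse order is what
makes the leftmost constituent the next one to be expanded, so
the output appears in left-to-right order.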
C. TESTING CONDITIONS AND CREATING RECORDS (TSTCND AND CRSEG)

Routines TSTCND and CRSEG share the same FORTRAN code.
The difference between the two routines is that each
initializes the CNLB (condition or label) variable to its own
value, and CRSEG also obtains an address for SEGMNT if there
is none. In function, the two routines are very different. CRSEG
executes compiled rules creating segments as specified by the
creation specifications of the right-part of a rule. TSTCND
tests the condition specifications on the left-part of a rule.

CRSEG is called from NEWSEG, ENCODE and PRNREC, while
TSTCND is called from NEWSEG and ENCODE. The instruction
codes processed by CRSEG and TSTCND are listed in Appendix F.

For the remainder of this section, a reference to CRSEG
will also imply TSTCND unless specifically stated as being
otherwise.

When CRSEG is called, all the instruction codes to be
executed are contained in the A-array. Thus the first function
of CRSEG is to retrieve the INST and XADDR values from the
A-array. Once ATC (equal to INST+1) and ADC (equal to XADDR)
are set, specific instruction codes can be executed by
the remainder of CRSEG.

Before CRSEG is called, XATPT and XAPT must be set to
point to the proper byte in the A-array where interpretation
is to begin. INST is set equal to the information contained
in the first byte to be processed. Each time INST is set, it
is tested for a zero value (separator between constituents)
or a value of 255 (represents the "arrow" or the end of a
rule). If either of these values is encountered, processing
of instruction codes ceases, and a return is made (the one
exception being if XI (transfer variable) is set to 3).
Otherwise, each INST and associated XADDR is processed.

CRSEG reverses the storage procedure for INST and XADDR
as performed by CODE. In CRSEG the INST value obtained from
the A-array specifies how many bytes following INST must be
processed to get the proper XADDR value. EQUIVALENCING as
shown in Figures 5A and 5B, and as already explained, is
used to process the A-array content. Once the proper INST
and XADDR values are obtained, ATC and ADC are set. The
ATC value determines where within CRSEG control is
transferred to process the particular INST code. After each
INST code and XADDR have been processed, the next INST is
obtained from the A-array. This is performed by setting I
(subscript variable for XSADR) equal to 2, XI equal to 1 and
extracting the next byte from the A-array. This process
continues until an INST value of 0 or 255 is encountered.
For TSTCND, either of these instruction codes (0 or 255)
will cause a return from the routine. For CRSEG, if the
calling argument was not 0, final entries for the newly
created segment record are made and then a return is made.

There are instances during the execution of CRSEG when
a return is signaled before an INST value of 0 or 255 is
encountered. Also, during decoding the situation can arise
when it is desirable to "skip" to the end of parentheses for
a constituent being processed. If either of these conditions
is encountered the EXECSW (execute switch) is set to false.
EXECSW set to false causes CRSEG to continue retrieving
INST and XADDR values until a 0 or 255 INST value is
encountered. The instruction codes retrieved with EXECSW set
to false will not be executed except for INST value 51. The
result of such processing is to exit CRSEG with XATPT and
XAPT pointing to the proper A-array byte for the decoding or
encoding process to continue.
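
The retrieval step, and the skipping performed when EXECSW is
false, can be sketched in Python as the inverse of the storage
rule of Section III.E. The function names fetch and
skip_to_end are hypothetical, the first XADDR byte is assumed
to be the high-order byte, and the special handling of INST
value 51 during skipping is omitted.

```python
def fetch(code, pos):
    """Decode one entry at code[pos]; return (inst, xaddr, next_pos).

    A stored value of 0 or 255 is returned unchanged with xaddr None.
    """
    stored = code[pos]
    if stored in (0, 255):
        return stored, None, pos + 1
    inst, size = stored % 64, stored // 64
    if size == 0:                               # no XADDR bytes
        return inst, 0, pos + 1
    if size == 1:                               # one byte, first byte zero
        return inst, code[pos + 1], pos + 2
    if size == 2:                               # one byte, both bytes equal
        b = code[pos + 1]
        return inst, (b << 8) | b, pos + 2
    return inst, (code[pos + 1] << 8) | code[pos + 2], pos + 3


def skip_to_end(code, pos):
    """Mimic EXECSW = false: decode without executing until 0 or 255."""
    while True:
        inst, _, pos = fetch(code, pos)
        if inst in (0, 255):
            return pos


code = bytes([5, 69, 7, 133, 3, 194, 0x12, 0x34, 255, 9])
print(fetch(code, 1))         # code 5 with a one-byte XADDR
print(fetch(code, 5))         # code 2 with a two-byte XADDR
print(skip_to_end(code, 0))   # position just past the 255 byte
```

Even when nothing is executed, every entry must still be
decoded, because the length of an entry is known only from its
instruction byte.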


V. RESULTS


To be able to determine the effectiveness of implementing
the A-array, the sample NLPQ terminal sessions described in
Ref. 3 were repeated. Also, the decoding, massager, English
encoding and GPSS encoding rules were compiled, and the named
record definitions were processed for the comparisons.

Implementing the A-array had the primary objective of
reducing the storage requirements for the rules used with NLP.
Table 1 contains the results of comparing the old and new
methods of storing NLPQ rules. SET1, SET2, SET3, and SET4
in Table 1 respectively represent the four sets of rules
mentioned above. The leftmost column of Table 1 has seven
entries representing the old and new storage requirements as
related to NLPQ rules.

The first three rows of Table 1 contain the number of
storage elements required by the old storage method. The
next three rows represent the rule storage requirements with
the new method. Row 7 shows the number of 8-byte elements
saved with the new method of storing NLPQ rules. The first
and fourth rows contain the number of CELL array record
elements that result from compiling the rules. These
elements are primarily for segment type records and named
records not previously defined. Row 2 shows the number of
CELL array storage elements the NLPQ rules required under
the old storage method. The addition of values from rows 1
and 2 gives the total CELL array requirements for the old
method, and these values are shown in row 3. The fifth row
shows the new storage requirements when the rules are stored
in the A-array. The sixth row shows the total number of
8-byte elements required using the new scheme. The savings
of 8-byte elements accomplished for rule storage are shown
in row 7. These values are the differences between rows 2
and 5. The percentage of 8-byte element savings for the
rules alone is 78% and the percentage for the total number
of elements saved is 67%.


                            SET1    SET2    SET3    SET4    TOTAL

OLD STORAGE SPACE
  RECORDS (CELL ARRAY)       802     162     368     491     1823
  RULES (CELL ARRAY)        6515     772    3049    1843    12179
  TOTAL ELEMENTS            7317     934    3417    2334    14002

NEW STORAGE SPACE
  RECORDS (CELL ARRAY)       802     162     368     491     1823
  RULES (A-ARRAY)           1354     181     735     457     2727
  TOTAL ELEMENTS            2156     343    1103     948     4550

SAVINGS                     5161     591    2314    1386     9452

Table 1.  Old and New NLPQ Rule Storage Requirements
          (Numbers Represent 8-Byte Elements)
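
The two percentages can be checked from the TOTAL column of
Table 1. The short Python calculation below uses only the
record count and the old and new totals; the variable names
are introduced here for illustration.

```python
records = 1823        # CELL array record elements, same in both methods
old_total = 14002     # old method: records plus rules
new_total = 4550      # new method: records plus rules

old_rules = old_total - records     # elements used by rules, old method
saved = old_total - new_total       # elements saved by the new method

pct_rules = 100 * saved / old_rules
pct_total = 100 * saved / old_total
print(old_rules, saved)
print(f"{pct_rules:.1f} {pct_total:.1f}")
```

The results, about 77.6 and 67.5 percent, agree with the 78%
and 67% quoted in the text.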

The secondary objective of this thesis was to reduce the
amount of paging performed by CP/CMS while executing NLP.
Such a reduction is reflected in the actual CPU times shown
in Table 2. A reduction in the virtual CPU time was also
achieved, as can be seen in the table, probably due to less
list processing. Columns 1 and 2 show the virtual CPU times
for the old and the new methods and columns 3 and 4 show the
actual CPU times.

The upper part of the table lists the CPU times taken
for compiling of NLPQ rules and processing the named record
definitions. The total times for these two are shown in row
7. The values show that the new method saved 31 seconds of
virtual CPU time and 121 seconds of actual CPU time. The
respective savings percentages accomplished are 24% and 37%.
The lower part of Table 2 shows the times required for
executing the NLP program with the sample problem as input.
Row 8 shows the times for decoding the problem into the
Internal Problem Description (IPD). Row 9 shows the times for
encoding the IPD into an equivalent English problem
description. Row 10 shows the times for encoding the IPD into a
GPSS program. Row 11 shows the total execution times for
both the old and new methods. The virtual CPU time savings
is 10 seconds and the actual CPU time savings is 138 seconds.
The respective percentages of time saved are 5% and 31%.


                                    TIMES (IN SECONDS)

                                VIRTUAL              ACTUAL
                                CPU TIME             CPU TIME
                               OLD     NEW          OLD     NEW
RULES:
   SET1                         46      37          108      64
   SET2                         13       9           28      18
   SET3                         32      24           86      54
   SET4                         25      16           65      38
TOTAL FOR RULES                116      86          297     174
NAMED RECORD DEFINITIONS        16  [illegible]      36      30
TOTAL COMPILING                132     107      [illegible] 204
SAMPLE PROBLEM:
   DECODING                    142     133          315     218
   ENGLISH ENCODING             25      26           76      41
   GPSS ENCODING                24      22           64      48
TOTAL EXECUTING                191     181          455     307

Table 2.  Virtual and Actual CPU Times for the Old and New
          Method for Processing the Sample NLPQ Problem.
VI. CONCLUSIONS AND RECOMMENDATIONS


The eventual implementation of a general purpose natural
language processor is inevitable. Such a processor may not
be general in the sense that any input statement will be
processed properly, but at least it will be very flexible
within its specific application. NLP is a significant
advance in this area of natural language processing, and NLPQ
is a working example of natural language man-machine
interaction using NLP.

As such processors become more flexible, it can be
expected that they will require more decoding and encoding
rules. The storage scheme developed and discussed in this
thesis has significantly reduced the amount of core storage
needed to store these rules, as discussed in Section V. As
more rules are added, the flexibility of specific NLP
applications will increase. The associated time savings, also
discussed in Section V, make NLP more responsive to the user,
in addition to reducing the computer cost when operating the
system.

For further reductions of storage requirements it is
recommended that techniques developed in this research be
applied to the storage of other information in NLP, such as
partial constituent list records and segment type records.
Also, if NLP continues to be used with a time sharing system,
work should be done to reduce the amount of paging performed.
Such a reduction would make NLP even more responsive to the
user.
APPENDIX A


NLP COMMANDS


COMMAND                    DEFINITION

ATTRIBUTES:                process attribute names
CONVERSE:                  perform a special kind of encoding
DECODE:                    perform decoding of text
DELETE:                    delete specified rules
ENCODE:                    perform encoding
END:                       terminate NLP program
INDICATORS:                process indicator names
LEXOLOGY:                  process lexological decoding rules
LEXOLOGY FOR ENCODING:     process lexological encoding rules
MAXLN:                     print number of CELL array elements
                           used for NLP processing
MORPHOLOGY:                process morphological decoding rules
MORPHOLOGY FOR ENCODING:   process morphological encoding rules
NAMED RECORDS:             process named record definitions
OPDATA:                    read optional data
PRINT:                     print specified information
ROUTINES:                  process routine names
SEMOLOGY:                  process semological decoding rules
SEMOLOGY FOR ENCODING:     process semological encoding rules
SETTRLEV:                  set trace level of specified
                           segment type
SIMULATE:                  call the simulation subroutine
TEXT:                      perform decoding of text
DEREN:                     print number of CELL array elements
                           used for storage
APPENDIX B


PARAMETERS


(Default Values are given in parentheses)


ENCNMS  (0.0)          segment type to be encoded during decoding

OUTFLA  (6)            output file for printed encoded text

OUTFLB  ([illegible])  output file for encoded text to be punched

OUT6    (6)            output file used for most output

PRCNMS  ([illegible])  segment type to print constituent structure

PRSLVL  ([illegible])  level of record printout, used with PRSNMS

PRSLVT  ([illegible])  level of record printout, used with PRSNMT

PRSNMS  ([illegible])  segment type to be printed during decoding
                       or encoding

PRSNMT  (0.0)          segment type to be printed during decoding
                       or encoding

RTERM   (5)            file number for terminal input

TRINDX  ([illegible])  index at which optional data is to be
                       entered

TRLEVS  (100)          trace level of segment types to be traced
                       during decoding

WTERM   (6)            file number for terminal output
APPENDIX C


SWITCHES


(Default Values are all 'false')


CHGIND   read indicator changes in ENCDSG routine

KEEPCL   keep constituent list structure

NOPURG   do not do any purging

PRTSW    print each line on OUT6 as it is read

SAME     read more than one logical file from a physical file

TRACE    print TRMPNT and NAMEX arrays when writing binary file

TRADSG   trace ADDSEG routine

TRGIND   read optional data in ENCDSG routine

TRNS     trace NEWSEG routine

TRNS2    trace NEWSEG routine

TRSENT   print sentence numbers

TRWORD   print word number

TR1      trace switch, depends on routine

TR2      trace switch, depends on routine

TR3      trace switch, depends on routine

TR4      trace switch, depends on routine

TR5      print record identification numbers

TR6      read optional data at the beginning of some routines
APPENDIX D


Attributes of a segment type record


APFCR    1   A list of pointers to decoding rules which
             have this segment type as their first
             constituent

APPCL    2   A list of pointers to partial constituent
             lists which are waiting for a segment of
             this type

AUCL     3   Specifies the number of characters in a
             segment of this type, if it is unique.

AMCL     4   Special information (gets its value from the
             number after a star on the right side of a
             decoding rule).

ARULES   8   A list of encoding rules which have this
             segment type on the left.

ATRLEV   9   The "trace level" specified (optionally)
             during the processing of rules.

ANMS    10   The EBCDIC representation of the name of this
             segment type.
APPENDIX E


SYMBOL CODES


TSCODE   MEANING

   1     Numeric Constant
   2     EBCDIC Character String
   3     Named Record Name
   4     Indicator Name
   5     Literal Record Name
   6     Routine Name
   7     Literal Attribute Name
   8     Element Separator
   9     Logical Not
  10     Copy
  11     Assignment
  12     Addition
  13     Subtraction, Deletion
  14     Multiplication, (also Star)
  15     Division
  16     Comparison
  17     "In the set" Test
  18     "In the set" Value
  19     Attribute
  20     Left Parenthesis
  21     Right Parenthesis
  22     Or
  23     Special Symbol
  24     Less Than
  25     Greater Than





APPENDIX F


INSTRUCTION CODES


Code     Meaning

   0     No operation (Separates constituents of a rule)
         Set record pointer to attribute value
         Set [illegible] number, for indirect specification
         Get value from an attribute
         Set an attribute to a value
         Set an indicator to a value
         Turn an indicator on
         Test for indicator on
         Test for indicator off
         Turn an indicator off
         Get a value from an indicator
         Set indicator from a named record
         Test for indicator on in a named record
         Test for indicator off in a named record
  24     Set record pointer to named record
  25     Get a pointer to named record for value
  26     Set attribute 1 to point to a named record
  27     Test for attribute 1 pointing to a particular
         named record
  28     Test for attribute 1 not pointing to a particular
         named record
  29     Get record pointer for value
  30     Set record pointer for literal record name
  31     Get value from numeric constant
  32     Get value from EBCDIC character string
  33     Copy attribute
  34     Test for presence of attribute
  35     Test for absence of attribute
  36     Test for set membership
  37     Test for lack of set membership
  38     Get copy of record for value
  39     Make segment be a copy of specified record
  40     Erase an attribute
  41     Subtract
  42     Add
  43     Divide
  44     Multiply
  45     Test for specified comparison
  46     Set attribute number, for indirect specification
  47     Make segment be a copy (automatic)
  48     Call a specified routine
  49     Or
  50     Find attribute value in a set
  51     Set contextual constituent variable
  52     Use same segment record instead of making an
         automatic copy
 53-63   Not used
 255     No operation (Signifies the ending of the left
         part or right part of a rule)